
[COPP] Fixed COPP test - reload logic #5651

Closed
wants to merge 1 commit into from

Conversation

ppikh
Contributor

@ppikh ppikh commented May 13, 2022

Signed-off-by: Petro Pikh [email protected]

Description of PR

Fixed COPP test - reload logic
The old logic caused occasional failures in the first test in the class due to traffic loss (it looks like the DUT is not ready after the reload). The issue was introduced in PR #5564.

Summary: Fixed COPP test - reload logic
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012

Approach

What is the motivation for this PR?

Fix test failure

How did you do it?

See code

How did you verify/test it?

Executed tests from file sonic-mgmt/tests/copp/test_copp.py

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

The old logic caused occasional failures in the first test in the class due to traffic loss

Signed-off-by: Petro Pikh <[email protected]>
@ppikh ppikh requested a review from a team as a code owner May 13, 2022 14:19
@liat-grozovik
Collaborator

/azp run Azure.sonic-mgmt

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Collaborator

@dgsudharsan could you please help to review?

@yejianquan
Collaborator

Hi @liat-grozovik, I don't understand this:
if the problem is that the DUT is not ready after config_reload, the safe_reload flag makes the DUT wait until the critical services have fully started.
Why does removing it help to solve the issue?

@ppikh
Contributor Author

ppikh commented May 16, 2022

Hi @yejianquan
If we use safe_reload, the first test case fails from time to time on some DUTs.
It looks to me like the DUT is not ready, but we start sending traffic anyway, and as a result the traffic check fails.
When I removed the "safe_reload" argument, the tests pass.

@yejianquan
Collaborator

> Hi @yejianquan If we use safe_reload - time to time on some DUTs we have failed first test case. For me it looks like DUT is not ready but we started to send traffic - as result traffic failed. When I removed argument "safe_reload" - tests passing.

Hi @ppikh, can you paste the error log?
The safe_reload flag runs well on our physical testbeds.
There's a PR merged recently for a test_copp error-log issue; I'm not sure whether it's related to your case:
#5642

@ppikh
Contributor Author

ppikh commented May 16, 2022

Hi @yejianquan

Please see error log below

ERROR    tests.ptf_runner:ptf_runner.py:63 Exception caught while executing case: copp_tests.ARPTest. Error message: Traceback (most recent call last):
  File "/root/mars/workspace/sonic-mgmt/tests/ptf_runner.py", line 56, in ptf_runner
    result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)
  File "/root/mars/workspace/sonic-mgmt/tests/common/devices/base.py", line 89, in _run
    raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
RunAnsibleModuleFail: run module shell failed, Ansible Results =>
{
    "changed": true, 
    "cmd": "ptf --test-dir ptftests copp_tests.ARPTest --qlen=100000 --platform nn -t 'peerip=u'\"'\"'10.0.0.22'\"'\"';verbose=False;myip=u'\"'\"'10.0.0.23'\"'\"';send_rate_limit=2000;target_port=11;has_trap=True' --device-socket 0-11@tcp://127.0.0.1:10900  --device-socket 1-11@tcp://10.210.24.22:10900", 
    "delta": "0:00:30.399052", 
    "end": "2022-05-13 03:12:35.341899", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "ptf --test-dir ptftests copp_tests.ARPTest --qlen=100000 --platform nn -t 'peerip=u'\"'\"'10.0.0.22'\"'\"';verbose=False;myip=u'\"'\"'10.0.0.23'\"'\"';send_rate_limit=2000;target_port=11;has_trap=True' --device-socket 0-11@tcp://127.0.0.1:10900  --device-socket 1-11@tcp://10.210.24.22:10900", 
            "_uses_shell": true, 
            "argv": null, 
            "chdir": "/root", 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "stdin_add_newline": true, 
            "strip_empty_ends": true, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2022-05-13 03:12:04.942847", 
    "stderr": "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.\n  from cryptography.hazmat.backends import default_backend\ncopp_tests.ARPTest ... FAIL\n\n======================================================================\nFAIL: copp_tests.ARPTest\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"ptftests/copp_tests.py\", line 267, in runTest\n    self.run_suite()\n  File \"ptftests/copp_tests.py\", line 201, in run_suite\n    self.one_port_test(self.target_port)\n  File \"ptftests/copp_tests.py\", line 196, in one_port_test\n    self.check_constraints(send_count, recv_count, time_delta_ms, rx_pps)\n  File \"ptftests/copp_tests.py\", line 257, in check_constraints\n    assert self.PPS_LIMIT_MIN <= rx_pps <= self.PPS_LIMIT_MAX, \"rx_pps {}\".format(rx_pps)\nAssertionError: rx_pps 0\n\n----------------------------------------------------------------------\nRan 1 test in 29.078s\n\nFAILED (failures=1)", 
    "stderr_lines": [
        "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.", 
        "  from cryptography.hazmat.backends import default_backend", 
        "copp_tests.ARPTest ... FAIL", 
        "", 
        "======================================================================", 
        "FAIL: copp_tests.ARPTest", 
        "----------------------------------------------------------------------", 
        "Traceback (most recent call last):", 
        "  File \"ptftests/copp_tests.py\", line 267, in runTest", 
        "    self.run_suite()", 
        "  File \"ptftests/copp_tests.py\", line 201, in run_suite", 
        "    self.one_port_test(self.target_port)", 
        "  File \"ptftests/copp_tests.py\", line 196, in one_port_test", 
        "    self.check_constraints(send_count, recv_count, time_delta_ms, rx_pps)", 
        "  File \"ptftests/copp_tests.py\", line 257, in check_constraints", 
        "    assert self.PPS_LIMIT_MIN <= rx_pps <= self.PPS_LIMIT_MAX, \"rx_pps {}\".format(rx_pps)", 
        "AssertionError: rx_pps 0", 
        "", 
        "----------------------------------------------------------------------", 
        "Ran 1 test in 29.078s", 
        "", 
        "FAILED (failures=1)"
    ], 
    "stdout": "", 
    "stdout_lines": []
}

ERROR    root:__init__.py:40 Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 1464, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 81, in get_result
    _reraise(*ex)  # noqa
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 174, in pytest_pyfunc_call
    testfunction(**testargs)
  File "/root/mars/workspace/sonic-mgmt/tests/copp/test_copp.py", line 89, in test_policer
    dut_type)
  File "/root/mars/workspace/sonic-mgmt/tests/copp/test_copp.py", line 287, in _copp_runner
    device_sockets=device_sockets)
  File "/root/mars/workspace/sonic-mgmt/tests/ptf_runner.py", line 64, in ptf_runner
    raise Exception
Exception
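The failing assertion in the log above ("AssertionError: rx_pps 0") can be reconstructed as a toy example. The real implementation lives in ptftests/copp_tests.py; the function shape here mirrors the traceback, but the rate limits are illustrative values only, not the test's actual thresholds:

```python
# Toy reconstruction of the failing check; limits are illustrative only.
def check_constraints(recv_count, time_delta_ms,
                      pps_limit_min=500, pps_limit_max=2000):
    # Received packets per second over the measurement window.
    rx_pps = int(recv_count / (time_delta_ms / 1000.0)) if time_delta_ms else 0
    # If the DUT was not ready when traffic was sent, recv_count is 0,
    # so rx_pps is 0 and this assertion fires exactly as in the log.
    assert pps_limit_min <= rx_pps <= pps_limit_max, "rx_pps {}".format(rx_pps)
    return rx_pps
```

This is why a DUT that silently drops all traffic right after reload shows up as `rx_pps 0` rather than a connection error: packets are sent, nothing comes back, and the lower PPS bound trips.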

@yejianquan
Collaborator

Hi @saiarcot895
I checked the previous test results from April (the safe_reload flag was introduced at the end of April); test_copp worked fine without safe_reload=True in the past.
Did you find any failed test_copp runs indicating that test_copp needs safe_reload=True?
If not, I prefer to merge this PR and observe the test results going forward.
But I still don't understand it; safe_reload should only make things better.

@saiarcot895
Contributor

Hi @yejianquan, I added safe_reload here not because anything was failing, but as a preventative measure, to make sure that everything was up and running before proceeding.

One issue I can see is that with safe reload enabled, instead of waiting for a fixed 120 seconds (the old behavior), the test may wait anywhere between 0 and 420 seconds (the 120 seconds plus a 300-second buffer for services to come up), which means it may end up waiting for less time if the services happen to be up sooner. However, if swss isn't ready when the services are reported to be up, maybe there needs to be some other check to make sure everything is ready and any ASIC programming is done.
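The two wait strategies described above can be sketched minimally. The names and numbers mirror this comment (120-second fixed wait, 300-second buffer); nothing here is a real sonic-mgmt API:

```python
# Sketch of the two post-reload wait strategies; illustrative names only.
FIXED_WAIT_S = 120          # old behavior: unconditional sleep after reload
SAFE_RELOAD_BUFFER_S = 300  # extra budget for critical services to come up

def reload_wait_bounds(safe_reload):
    """Return the (min, max) seconds the test may wait after config reload."""
    if safe_reload:
        # safe_reload polls and returns as soon as services report up,
        # so the actual wait can land anywhere in this range.
        return (0, FIXED_WAIT_S + SAFE_RELOAD_BUFFER_S)
    # Without safe_reload, the test always sleeps the full fixed interval.
    return (FIXED_WAIT_S, FIXED_WAIT_S)
```

The failure mode being discussed is the lower bound: returning as soon as services report up can mean proceeding before the ASIC is actually programmed.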

For now, you can go ahead and merge this PR.

@sanmalho-git
Contributor

FYI - we have PR #5610 open to add a check that the interfaces are up, as an optional action as part of config reload.
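The kind of optional check described here can be sketched as a bounded poll. `get_oper_status`, `sleep`, and `clock` are injected purely so the sketch is self-contained and testable; they are hypothetical, not sonic-mgmt APIs:

```python
# Hedged sketch of an interfaces-up wait; injected callables are hypothetical.
import time

def wait_interfaces_up(interfaces, get_oper_status, timeout_s=300, poll_s=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Return True once every interface is oper-up, False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if all(get_oper_status(intf) == "up" for intf in interfaces):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Compared with a fixed sleep, this returns as soon as the condition holds but still bounds the worst case, which is the trade-off discussed throughout this thread.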

@saiarcot895
Contributor

If that PR works consistently, then maybe that PR can be merged, and this one can be closed?

@liat-grozovik
Collaborator

@saiarcot895, do you mean to close this PR and keep #5610?
The other one is a bigger change and does not seem to be approved yet.
I suggest that, in order to have a working test, we take this PR first; any additional improvements as suggested by #5610 can be done on top of it. What do you say?

@liat-grozovik
Collaborator

@saiarcot895 kind reminder

@saiarcot895
Contributor

I reviewed and merged #5610, as I feel it might fix the issue seen here. Could you test with those changes and see if the issue is still present?

@ppikh
Contributor Author

ppikh commented May 31, 2022

Hi @saiarcot895 - it looks like PR #5610 fixed the issue (I ran the tests a few times and the issue does not reproduce).
I think this PR can be closed.

@ppikh ppikh closed this May 31, 2022